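We study the challenging problem of recovering detailed motion from a single motion-blurred image. Existing solutions to this problem estimate a single image sequence without considering the motion ambiguity of each region. As a result, their outputs tend to converge to the mean of the multi-modal possibilities. In this paper, we explicitly account for this motion ambiguity, which allows us to generate multiple plausible solutions, all in sharp detail. The key idea is to introduce a motion guidance representation, a compact quantization of 2D optical flow with only four discrete motion directions. Conditioned on the motion guidance, a novel two-stage decomposition network leads the blur decomposition to a specific, unambiguous solution. We propose a unified framework for blur decomposition that supports various interfaces for generating the motion guidance, including human input, motion information from adjacent video frames, and learning from a video dataset. Extensive experiments on synthetic datasets and real-world data show that the proposed framework outperforms previous methods both qualitatively and quantitatively, and that it also produces physically plausible and diverse solutions. Code is available at https://github.com/zzh-tech/animation-from-blur.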
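To make the motion guidance idea concrete, the following minimal sketch quantizes a 2D flow field into four direction classes. The quadrant-based class assignment and the names `quantize_flow` and `quantize_field` are assumptions made for illustration; the abstract only states that the guidance uses four discrete motion directions, so the paper's actual quantization may differ.

```python
def quantize_flow(dx, dy):
    """Map a 2D flow vector to one of four direction classes (0..3).

    Assumption: the classes are the four quadrants of the (dx, dy) plane,
    with vectors lying on an axis assigned to the non-negative side.
    """
    if dx >= 0 and dy >= 0:
        return 0
    if dx < 0 and dy >= 0:
        return 1
    if dx < 0 and dy < 0:
        return 2
    return 3


def quantize_field(flow):
    """Quantize a flow field given as rows of (dx, dy) tuples."""
    return [[quantize_flow(dx, dy) for dx, dy in row] for row in flow]


# Toy 2x2 flow field (made-up values).
flow = [[(1.5, 0.2), (-0.3, 2.0)],
        [(-1.0, -1.0), (0.7, -0.4)]]
guidance = quantize_field(flow)
```

Each pixel keeps only a coarse direction label, so different label maps for the same blurred image can select different plausible motions.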
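Rolling shutter (RS) distortion can be interpreted as the result of picking one row of pixels at a time from instantaneous global shutter (GS) frames during the exposure of an RS camera. This means that the information of each instantaneous GS frame is partially, yet sequentially, embedded in the row-dependent distortion. Inspired by this fact, we address the challenging task of reversing this process, i.e., extracting undistorted GS frames from images that suffer from RS distortion. However, RS distortion is coupled with other factors, such as the readout settings and the velocity of scene elements relative to the camera. Models that exploit only the geometric correlation between temporally adjacent images therefore generalize poorly to data with different readout settings and to dynamic scenes with both camera motion and object motion. In this paper, instead of consecutive frames, we propose to use a pair of images captured by dual RS cameras with reversed RS directions for this highly challenging task. Building on the symmetric and complementary nature of dual reversed distortion, we develop a novel end-to-end model, IFED, that generates a dual optical flow sequence through iterative learning of the velocity field during the RS time. Extensive experimental results show that IFED is superior to naive cascade schemes as well as to state-of-the-art methods that use adjacent RS images. Most importantly, although it is trained on a synthetic dataset, IFED is shown to be effective at retrieving GS frame sequences from real-world RS-distorted images of dynamic scenes. Code is available at https://github.com/zzh-tech/dual-versed-rs.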
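The row-picking interpretation of RS distortion can be sketched in a few lines. This is a simplified illustration and not the IFED model: it assumes exactly one instantaneous GS frame per row readout time, and the function name `rolling_shutter` and its `reverse` flag are hypothetical.

```python
def rolling_shutter(gs_frames, reverse=False):
    """Compose an RS image by taking one row from each instantaneous GS frame.

    gs_frames: list of H frames, each a list of H rows (assumption: one GS
               frame per row readout time).
    reverse:   if False, row r is read at time r (top-to-bottom readout);
               if True, row r is read at time H - 1 - r (bottom-to-top).
    """
    height = len(gs_frames[0])
    if len(gs_frames) != height:
        raise ValueError("expected one GS frame per image row")
    image = []
    for row in range(height):
        t = height - 1 - row if reverse else row
        image.append(list(gs_frames[t][row]))
    return image


# Toy input: the GS frame at time t is a 3x3 image filled with the value t.
gs_frames = [[[t] * 3 for _ in range(3)] for t in range(3)]
rs_down = rolling_shutter(gs_frames)                # top-to-bottom readout
rs_up = rolling_shutter(gs_frames, reverse=True)    # reversed readout
```

In the toy input each pixel value equals its capture time, so the two outputs show directly that the dual images sample the same rows at mirrored times, which is the symmetry the abstract refers to.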
Target Propagation (TP) is a biologically more plausible algorithm than error backpropagation (BP) for training deep networks, and improving the practicality of TP is an open issue. TP methods require the feedforward and feedback networks to form layer-wise autoencoders for propagating the target values generated at the output layer. However, this causes certain drawbacks; e.g., careful hyperparameter tuning is required to synchronize the feedforward and feedback training, and the feedback path usually requires more frequent updates than the feedforward path. Learning of the feedforward and feedback networks is sufficient to make TP methods capable of training, but is having these layer-wise autoencoders a necessary condition for TP to work? We answer this question by presenting Fixed-Weight Difference Target Propagation (FW-DTP), which keeps the feedback weights constant during training. We confirmed that this simple method, which naturally resolves the abovementioned problems of TP, can still deliver informative target values to hidden layers for a given task; indeed, FW-DTP consistently achieves higher test performance than a baseline, Difference Target Propagation (DTP), on four classification datasets. We also present a novel propagation architecture that explains the exact form of the feedback function of DTP in order to analyze FW-DTP.
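As a rough sketch of how a target can be propagated through a fixed feedback path, the code below applies the standard DTP difference correction, target_prev = h_prev + g(target_curr) - g(h_curr), with feedback weights that are never updated. The tanh feedback function, the weight values, and the names `feedback` and `dtp_target` are assumptions for illustration; the abstract does not specify the feedback function used in FW-DTP.

```python
import math


def feedback(weights, h):
    """Feedback function g(h) = tanh(B h); B is fixed and never trained."""
    return [math.tanh(sum(w * x for w, x in zip(row, h))) for row in weights]


def dtp_target(h_prev, h_curr, target_curr, weights):
    """Difference target for the lower layer:
    target_prev = h_prev + g(target_curr) - g(h_curr).
    """
    g_target = feedback(weights, target_curr)
    g_curr = feedback(weights, h_curr)
    return [hp + gt - gc for hp, gt, gc in zip(h_prev, g_target, g_curr)]


# Made-up fixed feedback weights (3x2) and activations.
B = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.6]]
h_prev = [0.2, -0.1, 0.4]
h_curr = [0.3, 0.7]

# If the upper-layer target equals its activation, the lower layer is unchanged.
same = dtp_target(h_prev, h_curr, h_curr, B)
# A shifted upper-layer target moves the lower-layer target.
shifted = dtp_target(h_prev, h_curr, [0.4, 0.6], B)
```

The difference term cancels whenever the target matches the activation, regardless of how good `B` is as an inverse, which is why a fixed feedback path can still yield usable targets.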
Despite the impact of psychiatric disorders on clinical health, early-stage diagnosis remains a challenge. Machine learning studies have shown that classifiers tend to be overly narrow in the diagnosis prediction task. The overlap between conditions leads to high heterogeneity among participants that is not adequately captured by classification models. To address this issue, normative approaches have surged as an alternative method. By using a generative model to learn the distribution of healthy brain data patterns, we can identify the presence of pathologies as deviations or outliers from the distribution learned by the model. In particular, deep generative models showed great results as normative models to identify neurological lesions in the brain. However, unlike most neurological lesions, psychiatric disorders present subtle changes widespread in several brain regions, making these alterations challenging to identify. In this work, we evaluate the performance of transformer-based normative models to detect subtle brain changes expressed in adolescents and young adults. We trained our model on 3D MRI scans of neurotypical individuals (N=1,765). Then, we obtained the likelihood of neurotypical controls and psychiatric patients with early-stage schizophrenia from an independent dataset (N=93) from the Human Connectome Project. Using the predicted likelihood of the scans as a proxy for a normative score, we obtained an AUROC of 0.82 when assessing the difference between controls and individuals with early-stage schizophrenia. Our approach surpassed recent normative methods based on brain age and Gaussian Process, showing the promising use of deep generative models to help in individualised analyses.
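The evaluation step described above, scoring scans by likelihood and summarizing separation with AUROC, can be sketched as follows. The scores are made-up toy numbers, not results from the study, and using the negative log-likelihood as the deviation score is an assumption; the function name `auroc` is hypothetical.

```python
def auroc(pos_scores, neg_scores):
    """AUROC as the probability that a positive case scores higher than a
    negative case, counting ties as one half (Mann-Whitney formulation).
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy negative log-likelihoods (higher = less typical under the normative model).
patient_nll = [5.1, 4.8, 6.0]
control_nll = [4.0, 4.9, 3.5]
score = auroc(patient_nll, control_nll)
```

An AUROC of 0.5 would mean the normative score does not separate the groups at all, while 1.0 would mean every patient scan is less likely than every control scan.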
In the field of reinforcement learning, because of the high cost and risk of policy training in the real world, policies are trained in a simulation environment and transferred to the corresponding real-world environment. However, the simulation environment does not perfectly mimic the real-world environment, leading to model misspecification. Multiple studies report significant deterioration of policy performance in a real-world environment. In this study, we focus on scenarios involving a simulation environment with uncertainty parameters and the set of their possible values, called the uncertainty parameter set. The aim is to optimize the worst-case performance on the uncertainty parameter set to guarantee the performance in the corresponding real-world environment. To obtain a policy for the optimization, we propose an off-policy actor-critic approach called the Max-Min Twin Delayed Deep Deterministic Policy Gradient algorithm (M2TD3), which solves a max-min optimization problem using a simultaneous gradient ascent-descent approach. Experiments in multi-joint dynamics with contact (MuJoCo) environments show that the proposed method exhibits a worst-case performance superior to that of several baseline approaches.
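The simultaneous gradient ascent-descent idea can be illustrated on a toy max-min problem. This is not M2TD3 itself: the quadratic objective, the step size, and the names `grad_x`, `grad_w`, and `ascent_descent` are assumptions chosen so that the iteration provably converges. Here `x` stands in for the policy parameters (maximized) and `w` for the uncertainty parameters (minimized).

```python
def grad_x(x, w):
    """Partial derivative of f(x, w) = -x**2 + x*w + w**2 with respect to x."""
    return -2.0 * x + w


def grad_w(x, w):
    """Partial derivative of f(x, w) = -x**2 + x*w + w**2 with respect to w."""
    return x + 2.0 * w


def ascent_descent(x, w, lr=0.1, steps=200):
    """Update both players at once from the same iterate:
    x takes a gradient ascent step, w takes a gradient descent step.
    """
    for _ in range(steps):
        gx, gw = grad_x(x, w), grad_w(x, w)
        x, w = x + lr * gx, w - lr * gw
    return x, w


# f is concave in x and convex in w, with its saddle point at (0, 0).
x_star, w_star = ascent_descent(x=1.0, w=-1.0)
```

Both gradients are evaluated at the same point before either variable moves, which is what distinguishes the simultaneous scheme from alternating best responses.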
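Intelligent monitoring using three-dimensional (3D) image sensors has been attracting attention in the context of smart cities. In intelligent monitoring, object detection is performed on point cloud data acquired by 3D image sensors to detect moving objects such as vehicles and pedestrians and thereby ensure safety on the road. However, the features of point cloud data vary widely, depending on the characteristics of the light detection and ranging (LIDAR) units used as 3D image sensors and on where the sensors are installed. Although a variety of deep learning (DL) models for object detection from point cloud data have been studied to date, no research has considered how to use multiple DL models according to the features of the point cloud data. In this work, we propose a feature-based model selection framework that creates various DL models by using multiple DL methods and by using training data with pseudo incompleteness generated by two artificial techniques: sampling and noise addition. It selects the most suitable DL model for the object detection task according to the features of the point cloud data acquired in the real environment. To demonstrate the effectiveness of the proposed framework, we compare the performance of multiple DL models on benchmark datasets created from the KITTI dataset and present example object detection results obtained in a real outdoor experiment. Depending on the situation, detection accuracy differs by up to 32% between DL models, which confirms the importance of selecting an appropriate DL model for each situation.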
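The two techniques for generating pseudo-incomplete training data, sampling and noise addition, might look like the sketch below. The uniform random subsampling, the Gaussian noise model, the parameter values, and the names `subsample` and `add_noise` are assumptions for illustration; the abstract does not specify how either technique is implemented.

```python
import random


def subsample(points, keep_ratio, rng):
    """Keep a random subset of points to mimic a sparser sensor."""
    k = max(1, int(len(points) * keep_ratio))
    return rng.sample(points, k)


def add_noise(points, sigma, rng):
    """Perturb every coordinate with zero-mean Gaussian noise."""
    return [tuple(c + rng.gauss(0.0, sigma) for c in p) for p in points]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
points = [(float(i), 0.0, 0.0) for i in range(100)]  # toy point cloud
sparse = subsample(points, keep_ratio=0.5, rng=rng)
noisy = add_noise(sparse, sigma=0.02, rng=rng)
```

Training one model per combination of `keep_ratio` and `sigma` would give a pool of models from which one can be selected to match the density and noise level observed in the field.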
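Prior studies on target-oriented conversational tasks lack a crucial notion that has been studied intensively in the context of goal-oriented artificial intelligence agents, namely planning. In this study, we propose the Target-Guided Open-Domain Conversation Planning (TGCP) task to evaluate whether neural conversational agents have goal-oriented conversation planning abilities. Using the TGCP task, we investigate the conversation planning abilities of existing retrieval models and recent strong generative models. The experimental results reveal the challenges facing current technology.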
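The research process includes many decisions, such as how to title a paper and where to publish it. In this paper, we introduce a general framework for investigating the effects of such decisions. The main difficulty in investigating these effects is that we need to know counterfactual outcomes, which are not available in reality. The key insight of our framework is inspired by existing counterfactual analyses in which researchers regard twins as counterfactual units. The proposed framework regards a pair of papers that cite each other as twins. Such papers tend to be parallel works on similar topics and in similar communities. We investigate twin papers that made different decisions, observe how the research impact of these studies develops, and estimate the effect of a decision from the difference in their impacts. We release our code and data, which we believe are highly beneficial given the scarcity of datasets for counterfactual studies.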
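Finding twin papers, defined above as pairs of papers that cite each other, can be sketched as a mutual-citation search. The dictionary-based citation format, the paper identifiers, and the function name `find_twins` are assumptions for illustration and do not reflect the authors' released code.

```python
def find_twins(citations):
    """Return sorted pairs of papers that cite each other.

    citations: dict mapping a paper id to the set of paper ids it cites.
    """
    twins = set()
    for paper, cited in citations.items():
        for other in cited:
            if other != paper and paper in citations.get(other, set()):
                twins.add(tuple(sorted((paper, other))))
    return sorted(twins)


# Toy citation graph (made-up identifiers).
citations = {
    "A": {"B", "C"},
    "B": {"A"},
    "C": {"D"},
    "D": {"C", "A"},
}
twins = find_twins(citations)
```

Given such pairs, a decision's effect could then be estimated by comparing an impact measure, such as citation counts, between the twin that made the decision and the twin that did not.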
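We propose a new approach to formally describing the requirements for statistical inference and to checking whether a program uses statistical methods appropriately. Specifically, we define belief Hoare logic (BHL) for formalizing and reasoning about the statistical beliefs acquired through hypothesis testing. This program logic is sound and relatively complete with respect to a Kripke model for hypothesis tests. We demonstrate by examples that BHL is useful for reasoning about practical issues in hypothesis testing. In our framework, we clarify the importance of prior beliefs in acquiring statistical beliefs through hypothesis testing, and we discuss the whole picture of the justification of statistical inference inside and outside the program logic.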
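Automated video-based assessment of surgical skills is a promising way to assist young surgical trainees, especially in resource-poor areas. Existing works often resort to a CNN-LSTM joint framework in which LSTMs model long-term relationships on spatially pooled short-term CNN features. However, this practice inevitably neglects the differences among semantic concepts such as tools, tissues, and background in the spatial dimension, which impedes the subsequent modeling of temporal relationships. In this paper, we propose a novel skill assessment framework, Video Semantic Aggregation (ViSA), which discovers different semantic parts and aggregates them across spatiotemporal dimensions. The explicit discovery of semantic parts provides an explanatory visualization that helps in understanding the neural network's decisions. It also enables us to further incorporate auxiliary information, such as kinematic data, to improve representation learning and performance. Experiments on two datasets show that ViSA is competitive with state-of-the-art methods. Source code is available at: bit.ly/miccai2022visa.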